February 1996

Designing multilingual applications using Developer/2000 and Oracle7

By Hitesh Sheth

This article is the second in a series dealing with Oracle's globalization capabilities. It provides an overview of the key issues involved in creating multilingual applications using Developer/2000 and Oracle7.

Deploying multilingual applications in today's business environment is more of a necessity than a choice. Any business seeking to expand its market will inevitably have to look beyond the shores of its home country. This usually means dealing with new languages and cultures. This also means that you'll have to globalize your internal information systems and/or the software you want to sell.

The good news is that Developer/2000 and Oracle7 provide a robust set of features (support for 42 languages and 150 code sets) that let you quickly make your applications national language support (NLS) compliant. However, there are issues that you as a developer need to address when building your application. They fall into four main categories: language adaptation, catering for different character encoding schemes, processing multilingual data, and user interface layout.

Language adaptation

How do you create an application that adapts to your end users' language and territory? Oracle does this for you automatically: all language-dependent features operate at runtime according to the NLS environment specified for each end user. Taking advantage of this capability greatly simplifies the design of a multilingual application, compared with hard-coding such features into the application itself.

The parameter that controls the language settings is called NLS_LANG, which has the syntax

<language>_<territory>.<character encoding>
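The NLS_LANG syntax is simple enough to take apart mechanically. The following Python sketch is illustrative only, because Oracle reads NLS_LANG itself. `parse_nls_lang` and `NlsLang` are hypothetical names. The sketch assumes that the territory and encoding parts may be omitted, since the article also sets NLS_LANG to a bare language name such as German.

```python
from typing import NamedTuple, Optional


class NlsLang(NamedTuple):
    language: str
    territory: Optional[str]
    charset: Optional[str]


def parse_nls_lang(value: str) -> NlsLang:
    """Split '<language>_<territory>.<character encoding>' into its parts.

    Assumption (not Oracle's documented parsing rules): territory and
    encoding may be omitted, e.g. a bare 'German'.
    """
    head, _, charset = value.strip().partition(".")
    language, _, territory = head.partition("_")
    if not language:
        raise ValueError(f"NLS_LANG value {value!r} has no language")
    return NlsLang(language, territory or None, charset or None)


print(parse_nls_lang("French_France.WE8ISO8859P1"))
```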

Among other things, this parameter controls the default formatting of numbers and dates. For example, a user could set NLS_LANG to French_France.WE8ISO8859P1 and then execute a simple query:

$ setenv NLS_LANG French_France.WE8ISO8859P1

select ename, hiredate, round(sal/12,2) sal from emp;

The results would be:

ENAME    HIREDATE   SAL
Müller   01/04/89   3795,83
Höscht   10/05/90   2933,33
Hélène   01/11/91   4066,67

Another user could specify a different NLS_LANG value, e.g.,

$ setenv NLS_LANG American_America.WE8ISO8859P1

select ename, hiredate, round(sal/12,2) sal from emp;

in which case the default formatting of the same query's results would change to:

ENAME    HIREDATE    SAL
Müller   01-APR-89   3795.83
Höscht   10-MAY-90   2933.33
Hélène   01-NOV-91   4066.67
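To make the difference concrete, here is a Python sketch that reproduces both result sets from per-territory conventions. The `TERRITORY_DEFAULTS` table is hypothetical. It holds only what the two outputs above show:

- For France: day/month/year separated by slashes, with a comma as the radix.
- For America: DD-MON-YY with English month abbreviations, with a period as the radix.

Oracle itself takes these defaults from its own NLS data, not from a table like this.

```python
from datetime import date

MONTHS_EN = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
             "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Hypothetical per-territory defaults, read off the two result sets above.
TERRITORY_DEFAULTS = {
    "France": {
        "date": lambda d: f"{d.day:02d}/{d.month:02d}/{d.year % 100:02d}",
        "radix": ",",
    },
    "America": {
        "date": lambda d: f"{d.day:02d}-{MONTHS_EN[d.month - 1]}-{d.year % 100:02d}",
        "radix": ".",
    },
}


def format_sal(value: float, radix: str) -> str:
    # Two decimal places, as in round(sal/12,2), with the territory's radix.
    return f"{value:.2f}".replace(".", radix)


def format_row(territory: str, ename: str, hiredate: date, sal: float) -> str:
    conv = TERRITORY_DEFAULTS[territory]
    return f"{ename:<9}{conv['date'](hiredate):<10}{format_sal(sal, conv['radix'])}"


# The same row, rendered under each territory's defaults.
for territory in ("France", "America"):
    print(format_row(territory, "Müller", date(1989, 4, 1), 3795.83))
```

The query text never changes. Only the conventions looked up at runtime differ, which is the point of the adaptation.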

This automatic adaptation is transparent to the application design, in the sense that the SQL statements themselves don't change. If specific formats are required, you can use language-independent format masks to achieve the same degree of transparent adaptation, as follows:

Expression                     NLS_LANG = American_America   NLS_LANG = French_France
to_char(<date>,'DD/Mon/YY')    17/Feb/93                     17/Fev/93
to_char(<num>,'99G999D99')     74,195.83                     74.195,83
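The G and D elements make the number mask language-independent. They are placeholders that are filled in at runtime from the session's numeric characters. The Python sketch below is a simplified, hypothetical `to_char`:

- It understands only the 9, G and D elements used in the table.
- It ignores Oracle's padding, sign and leading-zero rules.
- Its `nls_numeric_characters` argument follows Oracle's convention of decimal character first, group separator second.

```python
def to_char(value: float, mask: str, nls_numeric_characters: str) -> str:
    """Simplified TO_CHAR(number, mask) supporting only 9, G and D."""
    decimal_char, group_char = nls_numeric_characters[0], nls_numeric_characters[1]
    int_mask, _, frac_mask = mask.partition("D")
    frac_digits = frac_mask.count("9")
    int_part, _, frac_part = f"{abs(value):.{frac_digits}f}".partition(".")
    if int_mask.count("9") < len(int_part):
        raise ValueError("value does not fit the mask")
    digits = list(int_part)
    out = []
    # Walk the integer mask right to left, emitting digits and group separators
    # until the digits run out (so no leading separators are produced).
    for element in reversed(int_mask):
        if not digits:
            break
        if element == "9":
            out.append(digits.pop())
        elif element == "G":
            out.append(group_char)
    result = "".join(reversed(out))
    if frac_digits:
        result += decimal_char + frac_part
    return ("-" if value < 0 else "") + result


print(to_char(74195.83, "99G999D99", ".,"))  # American_America conventions
print(to_char(74195.83, "99G999D99", ",."))  # French_France conventions
```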

The number of characters in a default date format can vary between <territory> conventions, so your application design needs to ensure that dates are displayed correctly for the whole range of <territory> values the application supports. This is not normally an issue with number formats, since only the characters used for the radix (decimal) and group separators change.

Your multilingual application must avoid the use of language-dependent date and number literals in SQL statements, such as

create view staff as select * from emp
where hiredate > '1-JAN-89' and sal > '69999.00';

Such a view evaluates correctly only if the user's NLS environment is compatible with the date and number formats used in the literals. Therefore, you should instead use SQL statements with language-independent formats, such as

create view staff as select * from emp
where hiredate > TO_DATE ('1-JAN-89', 'DD-MON-YY',
                          'nls_date_language = American')
  and sal > TO_NUMBER ('69999.00', '99G999D99',
                       'nls_numeric_characters = ''.,''');

This view evaluates correctly independent of the NLS environment in force for the user session.
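The fragility of the first view can be modelled in a few lines of Python:

- `implicit_to_number` is hypothetical. It stands in for converting the string literal under the session's numeric characters.
- `explicit_to_number` pins those characters to '.,', the way the TO_NUMBER call in the second view does.

The sketch simply rejects a literal it can't read. It makes no claim about the exact error Oracle raises.

```python
import re


def implicit_to_number(literal: str, nls_numeric_characters: str) -> float:
    """Hypothetical model of converting a literal under session settings.

    Accepts digits, an optional leading minus sign and at most one
    session decimal character; anything else is rejected.
    """
    decimal_char = nls_numeric_characters[0]
    pattern = r"-?\d+(" + re.escape(decimal_char) + r"\d+)?"
    if not re.fullmatch(pattern, literal):
        raise ValueError(f"{literal!r} is not a number under {nls_numeric_characters!r}")
    return float(literal.replace(decimal_char, "."))


def explicit_to_number(literal: str) -> float:
    # Mirrors 'nls_numeric_characters = ''.,''': the result no longer
    # depends on the session's settings.
    return implicit_to_number(literal, ".,")


for session_chars in (".,", ",."):
    print(session_chars, explicit_to_number("69999.00"))
```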

The 7.2 Server release will support additional calendars, such as the Japanese Imperial and Arabic Hijrah calendars. Wherever these calendars use fixed formats, they'll override the Gregorian format mask specified. Hence, while it will be possible to build an application that adapts to different calendars for the input and display of dates, your application design will need to account for differences in the length of the formatted dates.

Adaptation also applies to sorting sequences. Each <language> specifies a default sort sequence used in ORDER BY queries, for example

$ setenv NLS_LANG German          $ setenv NLS_LANG Swedish
select letter from letters        select letter from letters
order by letter;                  order by letter;

LETTER                            LETTER
a                                 a
ä                                 b
b                                 z
z                                 ä
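A sort sequence is essentially a key function applied to each value. This Python sketch covers only lowercase letters, and both key functions are hypothetical simplifications:

- `german_key` files ä, ö and ü immediately after their base vowels.
- `swedish_key` places å, ä and ö after z.

Together they reproduce the two results shown above. Letters outside these tables are not handled.

```python
GERMAN_BASE = {"ä": "a", "ö": "o", "ü": "u"}
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def german_key(letter: str):
    # Umlauted vowels sort with their base vowel, just after it.
    return (GERMAN_BASE.get(letter, letter), letter in GERMAN_BASE)


def swedish_key(letter: str):
    # å, ä and ö are separate letters that follow z.
    return SWEDISH_ALPHABET.index(letter)


SORT_KEYS = {"German": german_key, "Swedish": swedish_key}


def order_by(letters, language):
    """Sort letters using the hypothetical sequence for the given language."""
    return sorted(letters, key=SORT_KEYS[language])


letters = ["z", "ä", "b", "a"]
print(order_by(letters, "German"))
print(order_by(letters, "Swedish"))
```

A plain binary sort of the character codes, such as `sorted(letters)`, also puts ä after z, because its code value is higher than that of any unaccented letter. That happens to match Swedish but not German, which is why a language-specific sort sequence is needed.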

Again, the change in sort sequence is transparent to the application design. In addition to setting the NLS environment prior to logon, you can also change it during a session using the ALTER SESSION statement, as shown here:

alter session set
  NLS_DATE_LANGUAGE = German
  NLS_DATE_FORMAT = 'DD.MON.YY'
  NLS_NUMERIC_CHARACTERS = '.,';
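One way to picture the effect of ALTER SESSION is as a layer of overrides on top of the defaults implied by NLS_LANG, lasting only for that session. The Python sketch below models this with `collections.ChainMap`. The American_America defaults shown are illustrative, not a complete or authoritative list of Oracle's NLS parameters. `new_session` and `alter_session` are hypothetical helpers.

```python
from collections import ChainMap

# Illustrative defaults implied by NLS_LANG=American_America.
NLS_LANG_DEFAULTS = {
    "NLS_DATE_LANGUAGE": "American",
    "NLS_DATE_FORMAT": "DD-MON-YY",
    "NLS_NUMERIC_CHARACTERS": ".,",
}


def new_session() -> ChainMap:
    # Each session gets its own empty override layer over the shared defaults.
    return ChainMap({}, NLS_LANG_DEFAULTS)


def alter_session(session: ChainMap, **params: str) -> None:
    # ChainMap writes go to its first mapping, i.e. this session's overrides.
    session.update(params)


session = new_session()
alter_session(session,
              NLS_DATE_LANGUAGE="German",
              NLS_DATE_FORMAT="DD.MON.YY",
              NLS_NUMERIC_CHARACTERS=".,")
print(session["NLS_DATE_FORMAT"], new_session()["NLS_DATE_FORMAT"])
```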

In the 7.2 Server release, you'll be able to specify all the NLS parameters, not just NLS_LANG, prior to logon. This gives you greater flexibility in defining the desired combination of language and territory conventions.

Catering for different character encoding schemes

Because you're creating a client/server application, two character encoding schemes are involved in a user session. One is the encoding scheme used by the input/output device (e.g., a character-mode terminal or a PC running Windows). The other is the encoding scheme used to store data in the Oracle7 database. You define the former with the NLS_LANG parameter, and the latter when you create the database with the CREATE DATABASE command. The two encoding schemes can be different, and different clients can use different schemes. Oracle performs any required conversion of data between encoding schemes, and this conversion is transparent to the application design.

When you use different encoding schemes, it's normally essential to ensure that they're compatible. If a character in the source data isn't defined in the target encoding scheme, it's converted to a replacement character. Replacement characters are defined as part of the specification for the target encoding scheme. They can be defined on a per-character basis where a suitable substitute exists; otherwise, a default character (usually a question mark) is used. To ensure a complete conversion, the server encoding scheme must contain all the characters used in the source data.
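Python's codec machinery offers a convenient way to model this behavior. In the sketch below, the `REPLACEMENTS` table and the `nls_replace` error-handler name are hypothetical. They play the role of the per-character replacements defined for a target scheme, and anything not listed becomes a question mark. Python codec names ('latin-1', 'ascii', 'cp850') stand in for Oracle character set names.

```python
import codecs

# Hypothetical per-character replacements for a 7-bit target scheme;
# characters not listed fall back to the default '?'.
REPLACEMENTS = {"é": "e", "è": "e", "ü": "u", "ö": "o", "ä": "a"}


def _nls_replace(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    chunk = exc.object[exc.start:exc.end]
    return "".join(REPLACEMENTS.get(ch, "?") for ch in chunk), exc.end


codecs.register_error("nls_replace", _nls_replace)


def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from the source scheme into the target scheme."""
    return data.decode(source).encode(target, errors="nls_replace")


print(convert("Hélène".encode("latin-1"), "latin-1", "ascii"))
```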

Conversion is always possible between any two single-byte encoding schemes, between two multi-byte schemes, and between a single-byte and a multi-byte scheme for a limited set of combinations. However, when data is converted between different multi-byte schemes, the length of a string in bytes may change, and this may affect the application design.
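Half-width katakana show the length change clearly. Shift-JIS stores them in one byte per character, but EUC-JP stores them in two. The sketch uses Python's built-in codecs for the two Japanese schemes, plus UTF-8 for comparison. `byte_lengths` is a hypothetical helper.

```python
def byte_lengths(text: str, encodings: list[str]) -> dict[str, int]:
    """Number of bytes the text occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in encodings}


halfwidth = "ｱｲｳ"  # three half-width katakana characters
fullwidth = "日本"  # two kanji

print(byte_lengths(halfwidth, ["shift_jis", "euc_jp", "utf-8"]))
print(byte_lengths(fullwidth, ["shift_jis", "euc_jp", "utf-8"]))
```

A column or screen field sized in bytes for one scheme can therefore be too small for the same text in another.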

Oracle7 provides support for UTF-8, a variable-width encoding of Unicode. This support allows you to create a multilingual repository and, at the same time, to ensure that the server character set is a superset of each client character set.
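The following Python sketch illustrates both properties:

- UTF-8 is variable-width: different characters take different numbers of bytes.
- Text arriving from clients in quite different schemes converts into a UTF-8 database without any replacement characters.

`utf8_widths` and `to_database` are hypothetical helpers, and Python codec names ('latin-1', 'iso8859_7', 'shift_jis') stand in for Oracle character set names such as WE8ISO8859P1.

```python
def utf8_widths(text: str) -> list[int]:
    """Bytes that UTF-8 uses for each character of the text."""
    return [len(ch.encode("utf-8")) for ch in text]


def to_database(raw: bytes, client_encoding: str) -> bytes:
    """Convert client bytes into a UTF-8 database encoding.

    Strict error handling is used throughout; UTF-8 can represent every
    character these client schemes decode to, so nothing is replaced.
    """
    return raw.decode(client_encoding).encode("utf-8")


print(utf8_widths("aéЖ日"))
for text, client in [("Hélène", "latin-1"), ("Ελλάδα", "iso8859_7"), ("日本", "shift_jis")]:
    print(client, to_database(text.encode(client), client).decode("utf-8"))
```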

Processing multilingual data

The range of languages that your multilingual application can support is inherently limited to the range of languages supported by the character encoding scheme being used. For data input and output, this range is determined by the input/output device; its hardware and software must allow you to use an appropriate encoding scheme according to the application requirements. For example, for west European languages, many UNIX systems support ISO 8859/1. Many computer manufacturers also support their own schemes that were devised to support the same language group, e.g., DEC Multinational, HP Roman8, IBM PC Code Page 850, and EBCDIC Code Page 500. However, such single-byte schemes can support only a limited range of characters, and you must use other schemes for east European languages and non-Latin based languages such as Greek and Russian. Multibyte schemes can support many more characters, but they were principally designed to support a specific Asian language (for example, Japanese, Korean, or Traditional or Simplified Chinese). They also support only a limited range of languages. All encoding schemes support the basic Latin alphabet (a to z).
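You can check a language's requirements against an encoding scheme's repertoire mechanically. This Python sketch uses Python codecs as stand-ins for several of the schemes named above. DEC Multinational is left out of the list. `can_encode`, `supporting_encodings` and `CANDIDATES` are hypothetical names.

```python
CANDIDATES = ["latin-1", "cp850", "hp_roman8", "cp500",
              "iso8859_5", "iso8859_7", "shift_jis"]


def can_encode(text: str, encoding: str) -> bool:
    """True if every character of the text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def supporting_encodings(text: str) -> list[str]:
    return [enc for enc in CANDIDATES if can_encode(text, enc)]


print(supporting_encodings("abcdefghijklmnopqrstuvwxyz"))  # basic Latin
print(supporting_encodings("Россия"))                      # Russian
```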

Where the required range of languages falls within the scope of a single encoding scheme, the choice of encoding schemes should be straightforward. The input/output device determines the encoding scheme used by the client application, which is made known to the Oracle software via NLS_LANG. If this scheme is the same for all users, it's also the natural choice for the database encoding scheme. If different users use different encoding schemes, choose the database encoding scheme most relevant to your user population, so as to minimize overall data conversion. The only restriction on the choice of a database encoding scheme is that an EBCDIC-based encoding scheme can't be used for a database on an ASCII-based server, and vice versa. As discussed earlier, where different encoding schemes are used, it's normally essential that their character repertoires (the sets of characters defined in each encoding scheme) be compatible. This prevents replacement characters from being used during conversion.
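You can test repertoire compatibility by trying every character of a client scheme against the database scheme. The sketch below handles single-byte client schemes only, and `lost_characters` is a hypothetical helper. Any character it returns would become a replacement character on its way into the database.

```python
def lost_characters(client: str, database: str) -> list[str]:
    """Characters of a single-byte client scheme that the database scheme
    cannot represent."""
    lost = []
    for value in range(256):
        try:
            ch = bytes([value]).decode(client)
        except UnicodeDecodeError:
            continue  # byte value unassigned in the client scheme
        try:
            ch.encode(database)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


print(len(lost_characters("cp850", "latin-1")))  # e.g. PC box-drawing characters
print(lost_characters("latin-1", "utf-8"))       # nothing is lost
```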

User interface layout

When designing the user interface for a multilingual application, keep in mind these points:

Leave spare room on the screen so that your application can be translated easily; using up all the screen real estate is a common mistake. When you translate from English into another language, string lengths typically grow by an average of 30 percent. In addition, don't hard-code text into bitmaps, since translating such bitmaps essentially means creating new ones.
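The 30 percent figure translates directly into a layout budget. In this hypothetical Python sketch, `field_width` reserves the English length plus 30 percent, rounded up. Because 30 percent is only an average, some translations will still overflow, as the French label in the example does.

```python
def field_width(english_label: str) -> int:
    """Characters to reserve: English length plus 30 percent, rounded up.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return (len(english_label) * 13 + 9) // 10


def fits(translation: str, english_label: str) -> bool:
    """Does a translated label fit the space reserved for the English one?"""
    return len(translation) <= field_width(english_label)


print(field_width("Hire date"))               # 12
print(fits("Date d'embauche", "Hire date"))  # False: 15 characters
```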

Many of the common assumptions made about color, sound, bitmaps, etc., don't necessarily apply to different cultures. Keep bitmaps as simple as possible, and avoid symbols unique to one culture. Sounds and colors can take on a whole new meaning depending on the country in which you're running the application.

For users deploying applications into the Middle East and North Africa, both Developer/2000 and Oracle7 provide support for bi-directional languages. Developer/2000 specifically provides support for controlling:

Conclusion

A single multilingual application has several benefits compared with maintaining separate versions of an application for each language. By taking advantage of the transparent runtime adaptation of language-dependent behavior, and by addressing the issues raised in this article, you can rapidly deploy a multilingual application.

Hitesh Sheth is Manager, NLS Product Management, Oracle Corporation.


[Return to Index for Exploring Oracle Developer/2000 and Designer/2000 - February 1996]

Copyright (c) 1996 The Cobb Group, a division of Ziff-Davis Publishing Company. All rights reserved.

Reproduction in whole or in part in any form or medium without express written permission of Ziff-Davis Publishing Company is prohibited. The Cobb Group and The Cobb Group logo are trademarks of Ziff-Davis Publishing Company.

Exploring Oracle Developer/2000 and Designer/2000 is a publication of The Cobb Group.
1-800-223-8720